Process-Oriented Planning and Average-Reward Optimality

Authors

  • Craig Boutilier
  • Martin L. Puterman
Abstract

We argue that many AI planning problems should be viewed as process-oriented, where the aim is to produce a policy or behavior strategy with no termination condition in mind, as opposed to goal-oriented. The full power of Markov decision models, adopted recently for AI planning, becomes apparent with process-oriented problems. The question of appropriate optimality criteria becomes more critical in this case; we argue that average-reward optimality is most suitable. While construction of average-optimal policies involves a number of subtleties and computational difficulties, certain aspects of the problem can be solved using compact action representations such as Bayes nets. In particular, we provide an algorithm that identifies the structure of the Markov process underlying a planning problem, a crucial element of constructing average-optimal policies, without explicit enumeration of the problem state space.
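For concreteness, the average-reward (gain) criterion at issue can be stated in standard MDP notation (a sketch of the textbook definition, not taken verbatim from the paper): for a stationary policy \pi and reward function r, the gain from state s is

    g^\pi(s) = \liminf_{N \to \infty} \frac{1}{N}\, \mathbb{E}^\pi_s\!\left[\sum_{t=0}^{N-1} r(s_t, a_t)\right],

and a policy is average-optimal if it maximizes g^\pi(s) at every state. Because this criterion depends only on long-run behavior, the gain is determined by the recurrent structure of the underlying Markov chain, which is why identifying that structure without enumerating the state space is a key step toward average-optimal policies.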


Related articles

An Average-Reward Reinforcement Learning Algorithm for Computing Bias-Optimal Policies

Average-reward reinforcement learning (ARL) is an undiscounted optimality framework that is generally applicable to a broad range of control tasks. ARL computes gain-optimal control policies that maximize the expected pa...
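The truncated abstract refers to gain-optimal policies; the classic tabular update in this ARL line of work is Schwartz's R-learning, which Mahadevan's algorithm extends toward bias optimality. A minimal sketch, assuming a hypothetical env object exposing reset(), step(a), and actions(s) (none of these names come from the paper):

    import random
    from collections import defaultdict

    def r_learning(env, steps=100_000, alpha=0.1, beta=0.01, epsilon=0.1):
        """Tabular R-learning (Schwartz, 1993): learns average-adjusted
        action values R[s][a] and an estimate rho of the average reward.
        `env` is a hypothetical continuing-task interface."""
        R = defaultdict(lambda: defaultdict(float))
        rho = 0.0                      # running estimate of average reward per step
        s = env.reset()
        for _ in range(steps):
            # epsilon-greedy action selection over the current estimates
            if random.random() < epsilon or not R[s]:
                a, greedy = random.choice(env.actions(s)), False
            else:
                a, greedy = max(R[s], key=R[s].get), True
            r, s2 = env.step(a)        # one transition; no episode boundaries
            best_next = max((R[s2][b] for b in env.actions(s2)), default=0.0)
            best_here = max((R[s][b] for b in env.actions(s)), default=0.0)
            # average-adjusted temporal-difference update (no discount factor)
            R[s][a] += alpha * (r - rho + best_next - R[s][a])
            if greedy:                 # update rho only on non-exploratory steps
                rho += beta * (r + best_next - best_here - rho)
            s = s2
        return R, rho

The design point relative to discounted Q-learning is that no discount factor appears: future value is kept finite by subtracting rho, the estimated reward rate, from each step's reward.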


Denumerable controlled Markov chains with average reward criterion: Sample path optimality

We consider discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains with denumerable state space and compact action space. The corresponding stochastic control problem of maximizing average rewards in the long run is studied. Departing from the most common position, which uses expected values of rewards, we focus on a sample path analysis of the stream of states...
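In this sample-path setting, optimality is required of the realized reward stream itself rather than of its expectation: roughly (a sketch of the standard definition; the paper's exact conditions are in the full text), a policy \pi^* is sample-path average optimal if, almost surely under \pi^*,

    \liminf_{N \to \infty} \frac{1}{N} \sum_{t=0}^{N-1} r(s_t, a_t) \ge g^*,

where g^* is the optimal expected average reward.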


A Probabilistic Analysis of Bias Optimality in Unichain Markov Decision Processes

Since the long-run average reward optimality criterion is underselective, a decision-maker often uses bias to distinguish between multiple average-optimal policies. We study bias optimality in unichain, finite state and action space Markov Decision Processes. A probabilistic approach is used to give intuition as to why a bias-based decision-maker prefers a particular policy over another. Using rel...
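In a unichain MDP the gain g^\pi is the same at every state, and the bias measures the total transient deviation from it (a standard sketch, with Cesàro limits suppressed):

    h^\pi(s) = \lim_{N \to \infty} \mathbb{E}^\pi_s\!\left[\sum_{t=0}^{N-1} \bigl( r(s_t, a_t) - g^\pi \bigr)\right].

Two policies with equal gain can thus differ in bias, which captures how much reward is gained or lost on the way to steady state.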


Bounded Parameter Markov Decision Processes with Average Reward Criterion

Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality based on optimistic and pessimistic criteria. These have been analyzed for discounted BMDPs. Here we pro...
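The two criteria the abstract names can be summarized as follows (a sketch in standard notation, writing g_M^\pi for the gain of policy \pi in an MDP M drawn from the bounded family \mathcal{M}):

    optimistic:   \max_\pi \max_{M \in \mathcal{M}} g_M^\pi(s)
    pessimistic:  \max_\pi \min_{M \in \mathcal{M}} g_M^\pi(s)

The optimistic criterion optimizes against the most favorable parameters consistent with the interval bounds; the pessimistic one, against the least favorable.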


Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning

Research in reinforcement learning (RL) has thus far concentrated on two optimality criteria: the discounted framework, which has been very well studied, and the average-reward framework, in which interest is rapidly increasing. In this paper, we present a framework called sensitive discount optimality which offers an elegant way of linking these two paradigms. Although sensitive discount optima...
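The link between the two criteria is the classical Laurent-series expansion of the discounted value function as the discount factor \gamma approaches 1 (a sketch of the standard expansion, not taken from the paper):

    V_\gamma^\pi(s) = \frac{g^\pi(s)}{1-\gamma} + h^\pi(s) + o(1), \qquad \gamma \uparrow 1,

so the average reward g^\pi dominates as \gamma \to 1, with the bias h^\pi as the next-order correction; sensitive discount optimality refines policy selection through successive terms of this expansion.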




Publication year: 1995